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Astrophysical sources are now observed by many different instruments at different wavelengths, 
from radio to high-energy gamma-rays, with an unprecedented quality. Putting all these data to¬ 
gether to form a coherent view, however, is a very difficult task. Each instrument has its own data 
format, software and analysis procedure, which are difficult to combine. It is for example very 
challenging to perform a broadband fit of the energy spectrum of the source. The Multi-Mission 
Maximum Likelihood framework (3ML) aims to solve this issue, providing a common frame¬ 
work which allows for a coherent modeling of sources using all the available data, independent of 
their origin. At the same time, thanks to its architecture based on plug-ins, 3ML uses the existing 
official software of each instrument for the corresponding data in a way which is transparent to 
the user. 3ML is based on the likelihood formalism, in which a model summarizing our knowl¬ 
edge about a particular region of the sky is convolved with the instrument response and compared 
to the corresponding data. The user can choose between a frequentist analysis, and a Bayesian 
analysis. In the former, parameters of the model are optimized in order to obtain the best match 
to the data (i.e., the maximum of the likelihood). In the latter, the priors specified by the user are 
used to build the posterior distribution, which is then sampled with Markov Chain Monte Carlo 
or Multinest. Our implementation of this idea is very flexible, allowing the study of point sources 
as well as extended sources with arbitrary spectra. We will review the problem we aim to solve, 
the 3ML concepts and its innovative potential. 
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1. Introduction 

Multi-wavelength data contain fundamental physical information about the nature of astro- 
physical sources, the emission mechanisms, and information about the Universe at large (cosmol¬ 
ogy), representing a big opportunity for the advancement of astrophysics. This is widely rec¬ 
ognized, as also reported in the NASA Decadal Survey [1]. However, instruments at different 
wavelengths are based on different technologies, requiring different handling, software, skills and 
resources. A joint analysis is hence difficult and sometimes impossible. The Multi-Mission Maxi¬ 
mum Likelihood framework (3ML) aims to solve this problem. In order to understand its innovative 
approach, we need to look at the observation process and how the problem we just presented has 
been solved in the past. 

The process of observation is represented in Fig. 1 . The sky {reality) is observed by a telescope, 
which collects photons, measure their properties (direction, energy etc.) and save them in a data 
set. In a certain sense, the observation process brings the signal from the reality domain to the data 
domain. It also necessarily adds noise to the data. We call irreducible the noise inherent to the 
observation, such as Poisson noise when counting photons. We instead call background the noise 
coming from the apparatus or the environment (detector noise, light pollution, or sources other than 
the one under study). In order to understand an astrophysical source, we need to connect theoretical 
models, which describe reality and live in that domain, to the data domain. Let us now formalize 
the problem. In the following upper-case and lower-case letters indicate respectively quantities in 
the reality and in the data domain. We also express area in cm^, energy in keV, time in seconds 
and solid angle in steradians. Be S = S{E,P) the true differential flux of photons coming from the 
sky position P at the energy E, in units of photons cm^^ s^^ keV^^ sr^^ An instrument, which is 
the bridge between reality and data, is characterized by its response R = R{e,E,p,P), which is a 
function of the true energy E and position P, and of the reconstructed energy e and position p. We 
call energy dispersion the phenomenon for which R{e >0 for e / F, while the phenomenon 

for which 7? (•, •, p, P) > 0 for p / P is related to the optical properties of the instrument, summarized 
by the so-called Point Spread Eunction (PSF). We assume that it is possible to factorize R as R = 
A{E,P) PSE{E,p,P) D{e,E), where: A is the geometric cross section of the detector visible by 
incident radiation coming from the direction P multiplied by the efficiency at energy E {effective 
area, in cm^); PSF is the Point Spread Function in sr^^ normalized to 1, i.e., f PSE{E,p,P) dp = \', 
and D is the energy dispersion in keV^' normalized to 1, i.e., f D{e,E) de = 1. For simplicity we 
assume that R does not vary during our observation. The differential rate of photons measured by 
our instrument at position p and energy e is: 



[photons s ^ keV ^ sr 


( 1 . 1 ) 


Typically an instrument measures radiation in / = l..m spatial bins and j = l..k energy bins of finite 
size. For example, an instrument with a CCD camera has spatial bins corresponding to the pixels 
in the camera. The number of photons detected in the i,j bin in the time interval At in the absence 
of noise is: 



( 1 . 2 ) 
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Figure 1: The observation process. 


where the first integral is performed on the space covered by the /-th spatial pixel, and ey,i and ey 2 
are the boundaries of the y-th energy bin. What we actually measure is a random variable: 

Oij =P{oij, ..) [photons], (1.3) 

where P{oi,j, ..) is a probability distribution describing our irreducible noise. It generally depends 
on Oij and possibly on other variables, such as the residual background. For example, if we are 
counting photons, F is a Poisson distribution. We can now state the problem we aim to solve: we 
have a set d of measurements o,y (data), we know*/? and the noise distribution P, and we want to 
find ouf as much as we can abouf S. This can be accomplished in fwo ways: i) bringing a model 
for S to the data domain by convolving if wifh R, and fhen comparing fhis folded model wifh fhe 
dafa fhrough for example a likelihood analysis based on fhe appropriafe noise disfribufion P, ii) on 
fhe converse bringing the signal back to the reality domain by de-convolving fhe dafa by R, and 
comparing such unfolded dafa wifh a fheorefical model. This second sfrafegy is fhe mosf common 
for mulfi-wavelengfh dafa, and we Iherefore analyze if firsl in fhe nexf secfion. 

2. Multi-wavelength analysis of a source 

For simplicify we assume in fhis secfion fhaf our insfrumenfs are poinfed on a region of fhe 
sky wifh only one poinf source (for example an AGN or a sfar) af fhe posifion P*, and we neglecf 
any residual background^. Then S oc 8k{P,Pf), where 8k is Kronecker’s della. We drop fhe P and S 
is now fhe true specfral energy disfribufion funclion for fhe poinf source. 

2.1 Bringing the signal back to the reality domain: Spectral Energy Distribution 

The problem of analyzing mulfi-wavelengfh dafa has been classically approached by using 
fhe de-convolulion process. For poinf sources fhis means building fhe so-called Specfral Energy 

'Usually we know R up to a certain degree. Our uncertainty about the response translates in systematic uncertainties, 
which are outside the scope of this paper and will be neglected here. 

^Although this is not realistic, a proper treatment of the background would complicate the notation, and it is not 
needed for the purposes of this paper. 
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Distribution, which is a plot of the differential energy flux measured at different energies (or wave¬ 
lengths). Such SED is then studied by fitting theoretical models to it. Let us consider one en¬ 
ergy bin centered on e with a very narrow band pass 5e and no energy dispersion. In this case 
D{e,E) = 5k{e,E), i.e., for every photon the measured energy is equal to the true energy. Hence, 
R = A{e,P^,) PSE{e,p,P-^), with /? > 0 if and only if e — de/2 < e < e + de/2. Then, equation 1.1 
becomes: 

n{p,e) = Sie) A{e,P^) PSE{e,p,P^). (2.1) 


Let us also use only one spatial bin covering the entire PSL of the instrument at all energies. We 
approximate eq.1.2 as: 


o ~ Af y S{e) A{e 

,P^)PSE{e,p,P^) dedp 

(2.2) 

= At S{e) de A(e. 

,P„) ^ PSE{e,p,P^) dp 

)• 

(2.3) 

= At S{e) de A{e 

(2.4) 

We then equate o to our measured counts o to build a measurement S for S{e) as: 


deAtA{e) ' 

photons cm ^ s ^ keV ^]. 

(2.5) 


A measurement for the energy flux E (e) is obtained by multiplying S by the typical energy of the 
photons in the energy band of interest and by the bandwidth de. Since de is small, the typical 
energy is e and^: 

E(e) = e de S(e) = ^. [keV cm^^ s^^]. (2.6) 

At A{e) 

If the data from all instruments can be divided in bins small enough for these approximations to 
hold, we can build the SED by repeating this procedure for all bins. Otherwise, eq. 1.1 cannot be 
simplified. This can happen for example if the energy dispersion for our instrument is large, or if 
the source is not bright enough to divide the signal in many bins. In order to proceed we are forced 
to assume a mathematical form for the function S{E). Let us write S{E) = k§{E), where we have 
singled out the normalization k and the shape S(£'). Lor example, for a power-law, ‘S>{E) = E^^. 
One way to choose such a function is to fit the entire dataset for each instrument, finding a shape 
providing a good fif^. Then: 

n{p,e) =k j §{E) [A{E,P^) PSE{E,p,P^) D{e,E)] dE (2.7) 

and (assuming again one spatial pixel covering the whole PSL): 

o = At k j S{E) [A{E,PPjPSE{E,p,P^)D{e,E)]dEdedp. (2.8) 

S acts as a weighting function for the response. We are implicitly assuming that S is valid over the 
whole range in true energy where D{e,E) > 0, with cyi < e < We could now determine k 

^The following quantity can be converted to the more typical unit of erg cm^^ ^ by multiplying it by 1.602 x 10^® 
erg keV^*. 

“^Note that using directly such a best fit to get a measurement for the differential flux at different energies is a bad 
idea, because it would make all the points obtained this way not-independent and their errors artificially small. This 
would make it very difficult to use such points when modeling the entire SED, since most fitting procedures such as 
minimization assume independent data points. 
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and the parameters of S by fitting this expression to our measured o. Unfortunately we have only 
one data point. If S = for example, we have to fix a, using the one obtained from a fit of the 
whole dataset for our instrument. We determine then a measurement k and its error. The average 
energy for the photons in our energy bin is: 


!:iES(E)dE 

lei 


(2.9) 


and finally we can take: 

F{{e)) = (e) {e 2 -ei)kS{{e)) (2.10) 


as our SED point. 

It is clear then that the procedure is cumbersome, and that our results are model-dependent, 
i.e., depend on the shape S we assumed. This dependence is strong for large energy bins and for 
instruments with large energy dispersion effects. A poor choice can bias the results introducing 
a systematic error in the measurement of the SED. The problem is particularly troubling for faint 
sources. Eirst, we cannot divide the data too much because we need a good signal-to-noise in 
each bin. Moreover, suppose that our data for one particular instrument are not good enough 
to distinguish between a shape S = and another one S' = i.e., S and S' give a 

comparably-good fit to the data. Which function shall we use as weighting? The situation is even 
more confused. After having build the SED for all the instruments - each instrument potentially 
with its own weighting function S- we then fit the entire SED with our theoretical model, which 
might differ substantially from the weighting functions used. This is incoherent and prone to errors. 
Also, this procedure is difficult to generalize to extended sources. Euckily there is a better option 
for these cases, which we will analyze in the next section. 


2.2 Bringing the model to the data domain: forward folding 

Suppose that we have a good understanding of the physical processes happening in a source, 
and we are able to write a model for the spectrum of such source ^(Elo:), where a is a vector of 
parameters whose values we want to constrain. We use eq. 1.1 and eq. 1.2 to obtain the predicted 
number of counts in any given bin (forward-folding). We can always do this, no matter the 
size of the bins or the characteristics of the instrument. We rewrite the probability function P 
introduced in eq. 1.3 as the probability P(o,y|o,y) of observing Oij photons given the expectation 

^ o’P 

Oij. Eor example, if P is a Poisson distribution, then P(o,y|o,y) = 'U , —. We do this for all our 
bins. Eet d be the collections of all the Oij (i.e., our data set), which are independent measurements. 
The joint probability of obtaining d given S and a is: 

L{d\S,a) = Y[Pioij\oij). (2.11) 

ij 

This is called likelihood function, and expresses the probability of obtaining cf if ^(E, a) is the true 
model. We can now look for the set OCmle which maximizes the likelihood (Maximum Likelihood 
Estimation, or MEE, Bohm & Zech [2]), which is our best fit to the data. Computationally it is 
more convenient to minimize the function — log (E) instead of maximizing L, but clearly otMLE 
which minimizes — log (E) is the same which maximizes E. We can build a likelihood function 
for any choice of P. Eor example, the usual function is (twice) the log-likelihood function 
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when P is a Gaussian distribution. Also, we can make a combined likelihood for datasets with 
different P distributions by simply summing the corresponding log-likelihood functions. If we 
have competing models for the same source, or if we do not know which S to use, we can explore 
different solutions and use the likelihood function for model selection (for example through the 
Likelihood Ratio Test). L can also be used to estimate the goodness-of-fit for our model. Unfortu¬ 
nately both these operations often require extensive Monte Carlo simulations. We can also combine 
L with prior knowledge and perform Bayesian inference, which also supplies alternative ways for 
model selection. The likelihood approach allows to study multiple sources at once, even extended 
sources, as long as we are able to formulate a function S{P,E). Hence forward-folding combined 
with likelihood or Bayesian analysis is a powerful tool which is always applicable, even when a 
model-independent SED is impossible or when we are dealing with many or extended sources. 

3. The Multi-Mission Maximum Likelihood framework (3ML) 

The forward-folding approach described in the previous section has been already adopted in 
the X-ray community. Software such as XSPEC [3], ISIS [4] or Sherpa [5] adopt such formalism. 
Unfortunately they handle only the OGIP format for the data^, which has been designed for X-ray 
data. This format is not well suited for observatories where the spatial information on single events 
is important. Eor instance, let us consider the Eermi Earge Area Telescope [6]. EAT is a gamma-ray 
telescope, sensitive from 20 MeV to over 300 GeV, with imaging capabilities. Eet us consider the 
case of a Gamma-Ray Burst, with a duration < few minutes. The typical background for the EAT 
is very low and a spatial PSE-like cluster of few events can constitute a solid detection on such 
a time scale. Accordingly, an appropriate likelihood function for EAT data must account for the 
spatial distribution of the events. To illustrate this point, we simulated a EAT observation of 100 s 
of a point source and a typical background (Eig. 2). There are 45 background counts and 15 source 
counts. The significance of the point source obtained with the ad-hoc EAT likelihood software, 
which accounts for the spatial structure of the data, is more than la. If we ignore that spatial 
structure, the Poisson probability of obtaining 45 + 15 = 60 counts or more when expecting 45 is 
2%, coiTesponding to less than 2 a. We can now see how transforming the data in OGIP format, 
which requires integrating over the spatial dimension, results in a big loss of sensitivity. A similar 
but even more extreme case is represented by the data from the High-Altitude Water Cerenkov 
experiment (HAWC), an array of water Cherenkov detectors instrumented with photomultipliers 
(PMTs), sensitive from 100 GeV to 100 TeV [7]. On top of the problem with the spatial resolution, 
which is important for HAWC as well, the latter also has a poor energy resolution due to technolog¬ 
ical limitations. Eor this reason, its response R is better formulated as a function which translates 
true energies for the incoming photon in number of hits recorded by the PMTs (nHits) instead of 
observed energy [see 8, for details]. Encoding such a response in the OGIP format is impossible. 
Similar problems are present for other high-energy observatories, such as VERITAS, HESS and 
MAGIC. Moreover, it is of course impossible to study the morphology of an extended source such 
as a Supernova Remnant using OGIP. These problems motivated us to create the Multi-Mission 
Maximum Eikelihood framework. 

^http://heasarc.gsfc.nasa.gov/docs/heasarc/ofwg/doc s/spectra/ogip_92_007/ogip_92_007.html 
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Figure 2: Left panel; simulation of a faint point source with Fermi/LAT. Right panel; an example of a 
likelihood maximization with 3ML. 


The need for flexibility described above led us to create a software architecture based on plug¬ 
ins, each one handling the data from one instrument. Each operating telescope has already its own 
official software (OS), developed for that observatory only. A plugin is an interface between 3ML 
and the OS of one instrument; it receives from the framework the likelihood model 
where a, is a particular set of parameters. It then uses the OS to compute the value for the likeli¬ 
hood function appropriate for that instrument for the set d^. This architecture guarantees the best 
treatment of the data, avoids placing any constraint on how data should be formatted or treated, 
and minimizes the effort for including a new instrument capitalizing on the existing OS. To illus¬ 
trate this point, let us consider a simple analysis for a single point source, using data from HAWC 
and Fermi/LAT. The analysis flow is shown in the right panel of Fig. 2: a model definition S with 
its parameters and initial values ds are fed in 3ML. The model values S(£', 1%) are passed to the 
plugins, which compute the likelihood values using the OSs of the two instruments. These values 
are then summed and passed to the fitting engine (MIGRAD in this case) which decides if the fit 
has converged. If not, new trial values are generated and the loop restarted. The key point 

here is that 3ML does not need to know how the plug-ins uses the OSs to compute the respective 
likelihood. Hence, each OS is free to use the language, the data format, the analysis procedure and 
the likelihood formula which best fits its needs. As clear from the formalism presented in the intro¬ 
duction, the likelihood model S can contain as many point sources as needed, and extended sources 
as well. For the first time, then, 3ML allows multi-wavelength modeling of extended sources. This 
is particularly interesting for combining instruments with wide field of view but comparably coarse 
spatial resolution with instruments with smaller FOV but higher spatial resolution, such as for ex¬ 
ample HAWC and VERITAS. Our architecture allows also to naturally account for the so-called 
nuisance parameters. For instance, let us consider again the Fermi/HAWC combined analysis. The 
background for FAT is made up of two components, the Galactic and the isotropic background 
component, which are known up to a normalization constant which must be fitted to the data. The 
background in HAWC is different and has its own normalization. These background normalizations 
are not of immediate interest for the study of the source, but must be taken into account in the analy- 
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sis. Parameters like these are called nuisance parameters. We divide a = {as,dCLAT,&HAWc)^ where 
CCs are the parameters for the source, while CClat and CChawc are parameters for the background in 
LAT and HAWC respectively. While cCs plays a role in the likelihood for both instruments, CClat 
does not play a role in the HAWC likelihood, nor OChawc in the LAT likelihood. We can write a 
profile likelihood [2] logwhich only depends on oCg, with: 

-logL,o((a^) =min[-logLLA7’(a^,aLA7’)]+ min [-logL// awc( 4,a//Awc)], (3.1) 

®LAr O-hAWC 

where minx [f{x,y)] denotes the minimum of / with respect to y with x fixed. Practically, the fitting 
engine only works on the CCg. For each trial set oCg the LAT plugin executes a inner fit in which 
the LAT likelihood is maximized with respect to CClat with oCg fixed, and fhe HAWC plugin does 
fhe same for OChawc- In ofher words fhe filling engine maximizes fhe profile likelihood in which 
CClat and OChawc are profiled out by fhe plugins. We can also deal wilh inler-calibralion belween 
inslrumenls, inlroducing a normalizafion conslanl for each inslrumenl as a nuisance parameter. 

In conclusion, fhe archileclure of 3ML provides fhe flexibilily needed for mulli-wavelenglh 
sludies. If is even possible lo implemenl a mulli-messenger analysis, where dala from neulrino or 
gravilalional wave experimenls are combined wilh eleclromagnelic dala. As long as our model S 
can make prediclions abouf neulrino or GW fluxes, plugins for such inslrumenls could provide Ihe 
corresponding likelihood. 3ML is open-source^. Il has been devised in Ihe conlexl of a coordination 
efforl belween Ihe Fermi, VERITAS and HAWC collaboration. Ils developmenl, as well as plugins 
for Fermi/LAT, Fermi/GBM, Swifl/XRT, Swifl/BAT, Swifl/UVOT, HAWC, VERITAS, HESS and 
optical telescopes wilh differenl filters are on-going. We will also provide a plugin for OSs based 
on GammaEib’, a toolbox for developing likelihood-based analysis for gamma-ray inslrumenls 
which will likely be adopted by new experimenls such as CTA. 
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